The choice of the loss function is critical in defining the outputs in a way that is sensitive to the application at hand. For example, least-squares regression with numeric outputs requires a simple squared loss of the form (y − ŷ)² for a single training instance with target y and prediction ŷ. One can also use other types of loss, such as the hinge loss for y ∈ {−1, +1} and real-valued prediction ŷ (with identity activation):

     L = max{0, 1 − y · ŷ}

The hinge loss can be used to implement a learning method referred to as the support vector machine.
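
As a concrete illustration, the following Python sketch (made-up numbers, assuming only NumPy) computes the squared loss and the hinge loss for a single training instance, along with the hinge-loss subgradient that a support vector machine update would use:

    import numpy as np

    def squared_loss(y, y_hat):
        # Least-squares regression loss for a single instance: (y - y_hat)^2
        return (y - y_hat) ** 2

    def hinge_loss(y, y_hat):
        # Hinge loss for y in {-1, +1} and real-valued prediction y_hat
        # (identity activation): max{0, 1 - y * y_hat}
        return max(0.0, 1.0 - y * y_hat)

    # A linear model w . x with identity activation
    w = np.array([0.5, -0.2])
    x = np.array([1.0, 2.0])
    y = 1                          # observed label in {-1, +1}
    y_hat = float(np.dot(w, x))    # 0.1

    print(squared_loss(y, y_hat))  # 0.81, since (1 - 0.1)^2
    print(hinge_loss(y, y_hat))    # 0.9, since 1 - (1)(0.1)

    # Subgradient of the hinge loss w.r.t. w: -y * x when the margin
    # y * y_hat is below 1, and zero otherwise.
    grad_w = -y * x if y * y_hat < 1 else np.zeros_like(w)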

For multiway predictions (like predicting word identifiers or one of multiple classes), the softmax output is particularly useful. However, a softmax output is probabilistic, and therefore it requires a different type of loss function. In fact, for probabilistic predictions, two different types of loss functions are used, depending on whether the prediction is binary or whether it is multiway:



  1. Binary targets (logistic regression): In this case, it is assumed that the observed value y is drawn from {−1, +1}, and the prediction ŷ is an arbitrary numerical value obtained using the identity activation function. In such a case, the loss function for a single instance with observed value y and real-valued prediction ŷ (with identity activation) is defined as follows:

     L = log(1 + exp(−y · ŷ))

    This type of loss function implements a fundamental machine learning method, referred to as logistic regression. Alternatively, one can use a sigmoid activation function to output ŷ ∈ (0, 1), which indicates the probability that the observed value y is 1. Then, the negative logarithm of |y/2 − 0.5 + ŷ| provides the loss, assuming that y is coded from {−1, 1}. This is because |y/2 − 0.5 + ŷ| indicates the probability that the prediction is correct. This observation illustrates that one can use various combinations of activation and loss functions to achieve the same result; the sketch following this list checks this equivalence numerically.




  2. Categorical targets: In this case, if ŷ₁ . . . ŷₖ are the probabilities of the k classes (using the softmax activation of Equation 1.9), and the rth class is the ground-truth class, then the loss function for a single instance is defined as follows:

     L = −log(ŷᵣ)

    This type of loss function implements multinomial logistic regression, and it is referred to as the cross-entropy loss. Note that binary logistic regression is identical to multinomial logistic regression when the value of k is set to 2 in the latter; the sketch below illustrates both cases.
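
A minimal Python sketch (made-up scores; NumPy assumed) verifies both claims above: the identity-activation and sigmoid-activation formulations of the binary loss produce the same value, and multinomial logistic regression with k = 2 reduces to binary logistic regression:

    import numpy as np

    def logistic_loss_identity(y, score):
        # Loss for y in {-1, +1} with a real-valued score
        # (identity activation): log(1 + exp(-y * score))
        return np.log(1.0 + np.exp(-y * score))

    def logistic_loss_sigmoid(y, y_hat):
        # Same loss via sigmoid activation: y_hat = sigmoid(score) lies
        # in (0, 1), and |y/2 - 0.5 + y_hat| is the probability that the
        # prediction is correct.
        return -np.log(np.abs(y / 2.0 - 0.5 + y_hat))

    def cross_entropy_loss(probs, r):
        # Multinomial logistic regression: probs are softmax outputs over
        # k classes, and r indexes the ground-truth class.
        return -np.log(probs[r])

    score = 0.8
    y = -1
    sigmoid = 1.0 / (1.0 + np.exp(-score))

    # The two binary formulations agree:
    print(logistic_loss_identity(y, score))   # ~1.1711
    print(logistic_loss_sigmoid(y, sigmoid))  # ~1.1711

    # With k = 2, the softmax over the scores (score, 0) reproduces the
    # sigmoid, so cross-entropy with the negative class as ground truth
    # (index 1) matches the binary loss:
    scores = np.array([score, 0.0])
    probs = np.exp(scores) / np.sum(np.exp(scores))
    print(cross_entropy_loss(probs, 1))       # ~1.1711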




The key point to remember is that the nature of the output nodes, the activation function, and the loss function depend on the application at hand. Furthermore, these choices also depend on one another. Even though the perceptron is often presented as the quintessential representative of single-layer networks, it is only a single representative out of a very large universe of possibilities. In practice, one rarely uses the perceptron criterion as the loss function. For discrete-valued outputs, it is common to use softmax activation with cross-entropy loss. For real-valued outputs, it is common to use linear activation with squared loss. Generally, cross-entropy loss is easier to optimize than squared loss.
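
To illustrate the last claim, consider a single sigmoid output unit. The sketch below (using the y ∈ {0, 1} coding conventional for this comparison, rather than the {−1, +1} coding above) compares the gradients of the squared loss and the cross-entropy loss with respect to the pre-activation. When the unit saturates on a wrong prediction, the squared-loss gradient nearly vanishes while the cross-entropy gradient stays large, which is one standard reason cross-entropy is easier to optimize:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Gradients w.r.t. the pre-activation z of a sigmoid output unit,
    # for a target y in {0, 1}:
    def grad_squared(z, y):
        # d/dz (y - sigmoid(z))^2 = -2 * (y - p) * p * (1 - p);
        # the factor p * (1 - p) vanishes when the unit saturates.
        p = sigmoid(z)
        return -2.0 * (y - p) * p * (1.0 - p)

    def grad_cross_entropy(z, y):
        # d/dz [-y*log(p) - (1-y)*log(1-p)] = p - y
        return sigmoid(z) - y

    # A confidently wrong prediction: y = 1 but z is very negative.
    z, y = -5.0, 1.0
    print(grad_squared(z, y))        # ~ -0.0132 (nearly vanishes)
    print(grad_cross_entropy(z, y))  # ~ -0.9933 (stays large)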